Fix mlnet performance benchmark timeouts by pre-provisioning the SSWE model into the Helix payload by LoopedBard3 · Pull Request #5250 · dotnet/performance

LoopedBard3 · 2026-06-30T21:45:57Z

Problem

Every ML.NET performance benchmark in the performance-ci pipeline (public definitionId=38) is timing out. The whole mlnet work item hangs and is eventually killed at the work item timeout, discarding all ML.NET results, so every mlnet test shows as failed.

Example failing run: https://dev.azure.com/dnceng-public/public/_build/results?buildId=1478345

Root cause

The culprit is StochasticDualCoordinateAscentClassifierBench.TrainSentiment, which applies a pretrained SSWE word embedding. ML.NET downloads that model (sentiment.emd, ~70 MB) from aka.ms/mlnet-resources at benchmark runtime if it isn't already on disk. On the Helix machines that download now stalls (a hung connection, not a fast failure), so the benchmark hangs at // BeforeActualRun and the work item runs until it times out.

I reproduced this locally: with the model download blocked (stalling proxy) and the cache cleared, TrainSentiment hangs at // BeforeActualRun exactly as in the Helix log. Pre-provisioning the model and setting MICROSOFTML_RESOURCE_PATH runs it to completion with no network access. Updating Microsoft.ML (tested 5.0.0) does not help; the runtime download persists in all versions.

Fix

In scripts/run_performance_job.py, for run_kind == "mlnet" only:

Download the SSWE model on the build agent (which has reliable connectivity) into the correlation payload at mlnet-resources/Text/Sswe/sentiment.emd, with retries and a blob→aka.ms fallback.
Set MICROSOFTML_RESOURCE_PATH to that payload dir via the Helix pre-commands, so ML.NET loads the embedding from disk and never makes the network call.

This is best-effort and strictly gated on run_kind == "mlnet": non-mlnet runs are unaffected, and if the agent download fails it logs a warning and falls back to today's behavior.
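The retry and fallback behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in scripts/run_performance_job.py: the function names, the attempt count, and the injectable `fetch` callable are all assumptions made for the example, and the actual source URLs are left to the caller.

```python
import time
import urllib.request
from typing import Callable, Iterable, Optional


def fetch_url(url: str, timeout: float = 60.0) -> bytes:
    """Read a URL with a socket timeout so a stalled connection fails instead of hanging."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_with_fallback(
    urls: Iterable[str],
    fetch: Callable[[str], bytes] = fetch_url,
    attempts_per_url: int = 3,
    delay_seconds: float = 0.0,
) -> Optional[bytes]:
    """Try each URL in order, retrying each one; return None if every attempt fails.

    Returning None (rather than raising) keeps the step best-effort: the caller
    can log a warning and continue with the existing runtime-download behavior.
    """
    for url in urls:
        for attempt in range(1, attempts_per_url + 1):
            try:
                return fetch(url)
            except OSError as exc:  # URLError and socket timeouts are OSError subclasses
                print(f"warning: attempt {attempt} for {url} failed: {exc}")
                if delay_seconds:
                    time.sleep(delay_seconds)
    return None
```

A caller would pass the primary URL first and the fallback second, and only wire up MICROSOFTML_RESOURCE_PATH when the result is not None.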

Why now?

The benchmark hasn't changed since 2019, and this isn't caused by any PR. The trigger is environmental, somewhere on the download path: Helix egress/proxy/TLS tightening, blob throttling, and/or a .NET 9+ HttpClient behavior change against this endpoint (.NET 8 sometimes completed while 9.0/main/ubuntu hung). It manifests as a multi-hour timeout because the failure is a stalled read rather than a fast failure. Pre-staging the asset removes the dependency regardless of which of these is the actual culprit.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The mlnet performance benchmarks (StochasticDualCoordinateAscentClassifierBench.TrainSentiment) apply a pretrained SSWE word embedding that ML.NET downloads (~70 MB) from aka.ms/mlnet-resources at benchmark runtime. That download stalls on the Helix machines, hanging the entire mlnet work item until it times out and is killed, discarding all mlnet results so every mlnet benchmark appears to fail. Download the model on the build agent (reliable connectivity) into the correlation payload and point MICROSOFTML_RESOURCE_PATH at it via the Helix pre-commands, removing the runtime network dependency. Best-effort and strictly gated on run_kind == mlnet, so non-mlnet runs are unaffected and a download failure falls back to prior behavior. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

This PR addresses persistent ML.NET benchmark work item timeouts in the performance-ci Helix pipeline by removing a runtime network dependency (ML.NET’s SSWE embedding download) and instead pre-staging the model inside the Helix correlation payload.

Changes:

Added a best-effort pre-provisioning step that downloads sentiment.emd (SSWE embedding) into the correlation payload for run_kind == "mlnet".
Wired Helix pre-commands to set MICROSOFTML_RESOURCE_PATH to the staged payload directory when provisioning succeeds.
Added retry + fallback URL logic for the model download.
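As a rough sketch of the pre-command wiring, the helper below builds the shell line that exports MICROSOFTML_RESOURCE_PATH on the Helix machine. The function name, the `os_group` values, and the use of the `HELIX_CORRELATION_PAYLOAD` variable to locate the payload are assumptions for illustration; the PR's actual code may build the path differently.

```python
from typing import List


def resource_path_pre_commands(
    os_group: str,
    provisioned: bool,
    payload_subdir: str = "mlnet-resources",
) -> List[str]:
    """Return Helix pre-commands that point ML.NET at the staged resources.

    If provisioning failed, return no commands so ML.NET keeps its default
    behavior of downloading the model at runtime.
    """
    if not provisioned:
        return []
    if os_group == "windows":
        # cmd.exe syntax: the variable is expanded on the Helix machine, not the agent.
        return [f"set MICROSOFTML_RESOURCE_PATH=%HELIX_CORRELATION_PAYLOAD%\\{payload_subdir}"]
    # POSIX shell syntax for Linux and macOS queues.
    return [f"export MICROSOFTML_RESOURCE_PATH=$HELIX_CORRELATION_PAYLOAD/{payload_subdir}"]
```

Leaving the payload variable unexpanded matters here: the correlation payload directory is only known on the Helix machine, so the agent emits the reference and the machine's shell resolves it.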


Download to a temp file, validate the size against Content-Length (when present) and a minimum-size floor, then atomically replace the destination so a truncated or early-closed response can't leave a corrupt sentiment.emd in the payload. Also make the function docstring more concise. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
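The temp-file-then-replace pattern from this commit can be sketched as below. It is an illustration under assumptions: `stage_file` and its parameters are hypothetical names, the response body is modeled as an iterable of byte chunks, and only the Content-Length comparison is shown (the minimum-size floor is omitted for brevity).

```python
import os
import tempfile
from typing import Iterable, Optional


def stage_file(
    chunks: Iterable[bytes],
    destination: str,
    expected_size: Optional[int] = None,
) -> int:
    """Write chunks to a temp file, validate the size, then move it into place.

    The temp file lives in the destination directory so os.replace stays on one
    filesystem and the final rename is atomic. On any failure the destination is
    left untouched and the partial temp file is removed.
    """
    directory = os.path.dirname(os.path.abspath(destination))
    os.makedirs(directory, exist_ok=True)
    fd, temp_path = tempfile.mkstemp(dir=directory, suffix=".part")
    try:
        written = 0
        with os.fdopen(fd, "wb") as handle:
            for chunk in chunks:
                handle.write(chunk)
                written += len(chunk)
        if expected_size is not None and written != expected_size:
            raise ValueError(f"expected {expected_size} bytes, got {written}")
        os.replace(temp_path, destination)
        return written
    finally:
        # After a successful replace the temp path no longer exists.
        if os.path.exists(temp_path):
            os.remove(temp_path)
```

With this shape, a truncated response raises before the rename, so a previously staged sentiment.emd (or the absence of one) is preserved.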

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

When the server sends a Content-Length, an exact match fully validates the download, so trust it regardless of size (the asset could legitimately shrink without becoming invalid). Only fall back to the minimum-size floor when no Content-Length is available. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
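The revised validation rule can be written as a small predicate. This is a sketch: the function name and the floor value are hypothetical, chosen only to make the example concrete.

```python
from typing import Optional

# Hypothetical floor; the value used in the PR is not stated in this thread.
MIN_SIZE_BYTES = 1_000_000


def size_is_acceptable(
    actual: int,
    content_length: Optional[int],
    min_size: int = MIN_SIZE_BYTES,
) -> bool:
    """Decide whether a downloaded file's size looks valid.

    An exact Content-Length match is trusted regardless of size. The
    minimum-size floor applies only when the server sent no Content-Length.
    """
    if content_length is not None:
        return actual == content_length
    return actual >= min_size
```

Under this rule a small file that matches its Content-Length is accepted, while the same file with no Content-Length header is rejected by the floor.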

Copilot AI review requested due to automatic review settings June 30, 2026 21:45

Copilot started reviewing on behalf of LoopedBard3 June 30, 2026 21:46

Copilot AI reviewed Jun 30, 2026


Comment thread scripts/run_performance_job.py

LoopedBard3 requested review from DrewScoggins and caaavik-msft July 1, 2026 17:36

LoopedBard3 marked this pull request as ready for review July 1, 2026 17:37

Copilot AI review requested due to automatic review settings July 1, 2026 17:37

Copilot started reviewing on behalf of LoopedBard3 July 1, 2026 17:38

Copilot AI reviewed Jul 1, 2026


Comment thread scripts/run_performance_job.py

LoopedBard3 self-assigned this Jul 1, 2026

LoopedBard3 wants to merge 3 commits into dotnet:main from LoopedBard3:loopedbard3/didactic-umbrella


2 participants
